Smart Paper Notes

A Novel Dataset and a Deep Learning Method for Mitosis Nuclei Segmentation and Classification

Huadeng Wang , Zhipeng Liu , Rushi Lan , Zhenbing Liu , Xiaonan Luo , Xipeng Pan , Bingbing Li

Categories: Computer Vision | Artificial Intelligence

2022-12-27

The mitotic nuclei count is an important indicator in the pathological diagnosis of breast cancer. Manual annotation requires experienced pathologists and is time-consuming and inefficient. Deep learning has produced models that perform well, but their generalization ability still needs to be strengthened. In this paper, we propose a two-stage mitosis segmentation and classification method named SCMitosis. First, segmentation with a high recall rate is achieved by the proposed depthwise separable convolution residual block and channel-spatial attention gate. Then, a classification network is cascaded to further improve the detection of mitotic nuclei. The proposed model is verified on the ICPR 2012 dataset, where it obtains an F-score of 0.8687, the highest among the state-of-the-art algorithms compared. The model also performs well on the GZMH dataset, which was prepared by our group and will be released for the first time with the publication of this paper. The code will be available at: https://github.com/antifen/mitosis-nuclei-segmentation.
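The abstract names a depthwise separable convolution residual block but gives no layer sizes. As a rough illustration of why such blocks are lighter than standard convolutions, the sketch below only counts weights. The channel counts and kernel size are assumptions chosen for the example, not values from the paper, and the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only.
c_in, c_out, k = 64, 128, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(std, sep, round(std / sep, 1))
```

With these assumed sizes the separable layer uses 8,768 weights against 73,728 for the standard layer, roughly 8.4 times fewer.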

Translated by Google Translate

An Embarrassingly Easy but Strong Baseline for Nested Named Entity Recognition

Hang Yan , Yu Sun , Xiaonan Li , Xipeng Qiu

Categories: Natural Language Processing

2022-08-09

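Named entity recognition (NER) is the task of detecting and classifying entity spans in text. When entity spans overlap one another, the problem is called nested NER. Span-based methods have been widely used to tackle nested NER. Most of these methods produce an $n \times n$ score matrix, where $n$ is the length of the sentence and each entry corresponds to a span. However, previous work ignores the spatial relations in the score matrix. In this paper, we propose using a convolutional neural network (CNN) to model these spatial relations. Despite its simplicity, experiments on three commonly used nested NER datasets show that our model surpasses several recently proposed methods that use the same pre-trained encoders. Further analysis shows that the CNN helps the model find nested entities more accurately. In addition, we found that different papers use different sentence tokenizations for the three nested NER datasets, which affects the comparison. We therefore release a pre-processing script to facilitate future comparison.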
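To make the idea of a span score matrix with spatial relations concrete, here is a minimal pure-Python sketch. It is not the authors' model: the additive start/end scoring, the averaging kernel, and all function names are assumptions for illustration. It only shows that entry (i, j) stands for the span from token i to token j, and that a small 2D convolution lets each span's score draw on neighbouring spans.

```python
def span_score_matrix(start_scores, end_scores):
    """Toy score for span (i, j): start score of token i plus end score of token j.

    Entries with i > j are not valid spans and are set to 0.0.
    """
    n = len(start_scores)
    return [[start_scores[i] + end_scores[j] if i <= j else 0.0
             for j in range(n)] for i in range(n)]


def conv2d_same(matrix, kernel):
    """2D cross-correlation with zero padding; output has the input's shape."""
    n, m = len(matrix), len(matrix[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for di in range(k):
                for dj in range(k):
                    r, c = i + di - pad, j + dj - pad
                    if 0 <= r < n and 0 <= c < m:
                        total += kernel[di][dj] * matrix[r][c]
            out[i][j] = total
    return out


# Hypothetical per-token scores for a 4-token sentence.
start = [0.1, 0.9, 0.2, 0.0]
end = [0.0, 0.2, 0.8, 0.1]
scores = span_score_matrix(start, end)

# A 3 x 3 averaging kernel stands in for a learned CNN filter.
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]
refined = conv2d_same(scores, kernel)
```

In a trained model the kernel weights would be learned and the scores would come from the encoder; the fixed kernel here is only a placeholder.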

HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative

Xiaomin Fang , Fan Wang , Lihang Liu , Jingzhou He , Dayong Lin , Yingfei Xiang , Xiaonan Zhang , Hua Wu , Hui Li , Le Song

Categories: Artificial Intelligence | Machine Learning

2022-07-28

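AI-based protein structure prediction pipelines, such as AlphaFold2, have achieved near-experimental accuracy. These advanced pipelines rely mainly on multiple sequence alignments (MSAs) and templates as inputs to learn co-evolution information from homologous sequences. However, searching protein databases for MSAs and templates is time-consuming, usually taking tens of minutes. We therefore explore the limits of fast protein structure prediction using only the primary sequence of a protein. HelixFold-Single is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2. The method first pre-trains a large-scale protein language model (PLM) on thousands of millions of primary sequences using the self-supervised learning paradigm; the PLM serves as an alternative to MSAs and templates for learning co-evolution information. Then, by combining the pre-trained PLM with the essential components of AlphaFold2, we obtain an end-to-end differentiable model that predicts the 3D coordinates of atoms from the primary sequence alone. HelixFold-Single is validated on the CASP14 and CAMEO datasets, achieving accuracy competitive with MSA-based methods on targets with large homologous families. Furthermore, HelixFold-Single takes much less time than mainstream protein structure prediction pipelines, demonstrating its potential in tasks that require many predictions. The code of HelixFold-Single is available at https://github.com/paddlepaddle/paddlehelix/tree/dev/dev/pprotein_folding/helixfold-single, and we also provide a stable web service at https://paddlehelix.baidu.com/app/drug/protein-single/prevast.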

Neural Points: Point Cloud Representation with Neural Fields

Wanquan Feng , Jin Li , Hongrui Cai , Xiaonan Luo , Juyong Zhang

Categories: Computer Vision

2021-12-08

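In this paper, we propose Neural Points, a novel point cloud representation. Unlike traditional point cloud representations, in which each point represents only a position or a local plane in 3D space, each point in Neural Points represents a local continuous geometric shape via neural fields. Neural Points can therefore express more complex details and has a stronger representation ability. Neural Points is trained on high-resolution surfaces containing rich geometric details, so that the trained model has enough expressive power for a variety of shapes. Specifically, we extract deep local features on the points and construct neural fields through the local isomorphism between the 2D parametric domain and the 3D local patch. Finally, the local neural fields are integrated to form the global surface. Experimental results show that Neural Points has powerful representation ability and demonstrates excellent robustness and generalization. With Neural Points, we can resample a point cloud at arbitrary resolution, and it outperforms state-of-the-art point cloud upsampling methods by a large margin.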

Cross-Modal Object Tracking: Modality-Aware Representations and A Unified Benchmark

Chenglong Li , Tianhao Zhu , Lei Liu , Xiaonan Si , Zilin Fan , Sulan Zhai

Categories: Computer Vision

2021-11-08

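In many visual systems, visual tracking is based on RGB image sequences, in which some targets become invisible in low-light conditions, so tracking performance suffers significantly. Introducing other modalities such as depth and infrared data is an effective way to handle the imaging limitations of a single source, but multi-modal imaging platforms usually require elaborate design and cannot currently be used in many real-world applications. Near-infrared (NIR) imaging has become an essential part of many surveillance cameras, which switch between RGB and NIR imaging according to the light intensity. The two modalities are heterogeneous, with very different visual properties, and therefore pose major challenges for visual tracking. However, existing work has not studied this challenging problem. In this work, we address the cross-modal object tracking problem and contribute a new video dataset comprising 654 cross-modal image sequences with more than 481K frames in total and an average video length of more than 735 frames. To promote research and development in cross-modal object tracking, we propose a new algorithm that learns a modality-aware target representation to mitigate the appearance gap between the RGB and NIR modalities during tracking. It is plug-and-play and can therefore be flexibly embedded into different tracking frameworks. We conduct extensive experiments on the dataset and demonstrate the effectiveness of the proposed algorithm in two representative tracking frameworks against 17 state-of-the-art tracking methods. We will release the dataset for free academic use; the dataset download link and code will be released soon.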

An invertible crystallographic representation for general inverse design of inorganic crystals with targeted properties

Zekun Ren , Siyu Isaac Parker Tian , Juhwan Noh , Felipe Oviedo , Guangzong Xing , Jiali Li , Qiaohao Liang , Ruiming Zhu , Armin G. Aberle , Shijing Sun

Categories: Machine Learning

2020-05-15

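Realizing general inverse design could greatly accelerate the discovery of new materials with user-defined properties. However, state-of-the-art generative models tend to be limited to a specific composition or crystal structure. Here, we present a framework capable of general inverse design (not limited to a given set of elements or crystal structures). It features a generalized invertible representation that encodes crystals in both real and reciprocal space, and a property-structured latent space from a variational autoencoder (VAE). In three design cases, the framework generates 142 new crystals with user-defined formation energy, bandgap, thermoelectric (TE) power factor, and combinations thereof. These generated crystals, which are absent from the training database, are validated by first-principles calculations. The success rate (number of first-principles-validated crystals that meet the target / number of designed crystals) ranges between 7.1% and 38.9%. These results represent a significant step toward property-driven general inverse design using generative models, although practical challenges remain when coupling with experimental synthesis.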

Attentional Graph Convolutional Network for Structure-aware Audio-Visual Scene Classification

Liguang Zhou , Yuhongze Zhou , Xiaonan Qi , Junjie Hu , Tin Lun Lam , Yangsheng Xu

Categories: Computer Vision

2022-12-31

Audio-visual scene understanding is a challenging problem because of the unstructured spatial-temporal relations in audio signals and the spatial layouts of different objects and varied texture patterns in visual images. Recently, many studies have focused on abstracting features from convolutional neural networks, while the learning of explicit, semantically relevant frames of sound signals and visual images has been overlooked. To this end, we present an end-to-end framework, the attentional graph convolutional network (AGCN), for structure-aware audio-visual scene representation. First, the spectrogram of the sound and the input image are processed by a backbone network for feature extraction. Then, to build multi-scale hierarchical information from the input features, we use an attention fusion mechanism to aggregate features from multiple layers of the backbone network. Notably, to represent the salient regions and contextual information of the audio-visual inputs, the salient acoustic graph (SAG), contextual acoustic graph (CAG), salient visual graph (SVG), and contextual visual graph (CVG) are constructed for the audio-visual scene representation. Finally, the constructed graphs pass through a graph convolutional network for structure-aware audio-visual scene recognition. Extensive experiments on audio, visual, and audio-visual scene recognition datasets show that AGCN achieves promising results. Visualizations of the graphs on the spectrograms and images are presented to show that the proposed CAG/SAG and CVG/SVG focus on salient and semantically relevant regions.

A Transformer-Based User Satisfaction Prediction for Proactive Interaction Mechanism in DuerOS

Wei Shen , Xiaonan He , Chuheng Zhang , Xuyun Zhang , Jian Xie

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-05

Recently, spoken dialogue systems have been widely deployed in a variety of applications, serving a huge number of end-users. A common issue is that the errors resulting from noisy utterances, semantic misunderstandings, or lack of knowledge make it hard for a real system to respond properly, possibly leading to an unsatisfactory user experience. To avoid such a case, we consider a proactive interaction mechanism where the system predicts the user satisfaction with the candidate response before giving it to the user. If the user is not likely to be satisfied according to the prediction, the system will ask the user a suitable question to determine the real intent of the user instead of providing the response directly. With such an interaction with the user, the system can give a better response to the user. Previous models that predict the user satisfaction are not applicable to DuerOS which is a large-scale commercial dialogue system. They are based on hand-crafted features and thus can hardly learn the complex patterns lying behind millions of conversations and temporal dependency in multiple turns of the conversation. Moreover, they are trained and evaluated on the benchmark datasets with adequate labels, which are expensive to obtain in a commercial dialogue system. To face these challenges, we propose a pipeline to predict the user satisfaction to help DuerOS decide whether to ask for clarification in each turn. Specifically, we propose to first generate a large number of weak labels and then train a transformer-based model to predict the user satisfaction with these weak labels. Empirically, we deploy and evaluate our model on DuerOS, and observe a 19% relative improvement on the accuracy of user satisfaction prediction and 2.3% relative improvement on user experience.
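The proactive interaction rule described above can be sketched in a few lines. This is a minimal illustration, not the DuerOS implementation: the threshold value and the function name are assumptions, and the satisfaction probability is taken as given (in the paper it would come from the transformer-based predictor).

```python
def choose_action(satisfaction_prob, threshold=0.5):
    """Decide how to handle a candidate response.

    satisfaction_prob: predicted probability that the user will be satisfied.
    threshold: hypothetical cut-off, not a value from the paper.
    Returns "respond" to give the candidate response directly, or "clarify"
    to ask the user a question about their real intent first.
    """
    if not 0.0 <= satisfaction_prob <= 1.0:
        raise ValueError("satisfaction_prob must be in [0, 1]")
    return "respond" if satisfaction_prob >= threshold else "clarify"


# Hypothetical predictions for two turns of a conversation.
actions = [choose_action(p) for p in (0.82, 0.31)]
```

Raising the threshold makes the system ask for clarification more often, trading extra turns for fewer unsatisfactory answers.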

Real2Sim2Real Transfer for Control of Cable-driven Robots via a Differentiable Physics Engine

Kun Wang , William R. Johnson III , Shiyang Lu , Xiaonan Huang , Joran Booth , Rebecca Kramer-Bottiglio , Mridul Aanjaneya , Kostas Bekris

Categories: Robotics | Artificial Intelligence | Machine Learning

2022-09-13

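Tensegrity robots, composed of rigid rods and flexible cables, exhibit high strength-to-weight ratios and extreme deformations, enabling them to navigate unstructured terrain and even survive harsh impacts. However, they are hard to control because of their high dimensionality, complex dynamics, and coupled architecture. Physics-based simulation is one avenue for developing locomotion policies that can then be transferred to real robots, but modeling tensegrity robots is a complex task, so simulations suffer from a substantial sim2real gap. To address this issue, this paper describes a Real2Sim2Real strategy for tensegrity robots. The strategy is based on a differentiable physics engine that can be trained with limited data from a real robot (i.e., offline measurements and one random trajectory) and achieves accuracy high enough to discover transferable locomotion policies. Beyond the overall pipeline, the key contributions of this work include computing non-zero gradients at contact points, a loss function, and a trajectory segmentation technique that avoids conflicts in gradient evaluation during training. The proposed pipeline is demonstrated and evaluated on a real 3-bar tensegrity robot.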

Binary Representation via Jointly Personalized Sparse Hashing

Xiaoqin Wang , Chen Chen , Rushi Lan , Licheng Liu , Zhenbing Liu , Huiyu Zhou , Xiaonan Luo

Categories: Computer Vision

2022-08-31

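Unsupervised hashing has attracted much attention for binary representation learning because of the need for economical storage and the efficiency of binary codes. It aims to encode high-dimensional features in the Hamming space while preserving similarity between instances. However, most existing methods learn hash functions with manifold-based approaches. These methods capture the local geometric structure of the data (i.e., pairwise relationships) and perform unsatisfactorily in real-world scenarios where similar features (e.g., color and shape) carry different semantic information. To address this challenge, we propose an effective unsupervised method, Jointly Personalized Sparse Hashing (JPSH), for binary representation learning. Specifically, we first propose a novel personalized hashing module, Personalized Sparse Hashing (PSH). Different personalized subspaces are constructed to reflect the category-specific attributes of different clusters, adaptively mapping instances within the same cluster to the same Hamming space. In addition, we impose sparse constraints on the personalized subspaces to select important features. We also draw on the strengths of the other clusters when building the PSH module, to avoid over-fitting. Then, to preserve semantic and pairwise similarities simultaneously in JPSH, we incorporate PSH and manifold-based hash learning into a seamless formulation. As a result, JPSH not only distinguishes instances from different clusters but also preserves the local neighborhood structure within each cluster. Finally, an alternating optimization algorithm is adopted to iteratively obtain analytical solutions of the JPSH model. Extensive experiments on four benchmark datasets verify that JPSH outperforms several hashing algorithms on the similarity search task.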
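For readers new to hashing-based retrieval, the sketch below shows what encoding features in the Hamming space means in the simplest case. It uses generic sign-of-projection hashing with fixed, hand-picked projection vectors, which is an assumption for illustration; it is not JPSH, whose personalized sparse hash functions are learned per cluster.

```python
def binarize(features, projections):
    """Sign-of-projection hashing: one bit per projection vector."""
    code = []
    for w in projections:
        dot = sum(f * wi for f, wi in zip(features, w))
        code.append(1 if dot >= 0 else 0)
    return code


def hamming_distance(a, b):
    """Number of bit positions in which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 2D features and three fixed projection vectors.
projections = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
x = binarize([0.5, 0.2], projections)    # [1, 1, 1]
y = binarize([0.4, 0.6], projections)    # [1, 1, 0]
z = binarize([-0.5, -0.1], projections)  # [0, 0, 0]

# Nearby features share more bits than distant ones.
print(hamming_distance(x, y), hamming_distance(x, z))  # 1 3
```

Similarity search then reduces to comparing short bit strings, which is why binary codes are cheap to store and fast to match.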
